Abstract

A fast, reliable way of predicting aerodynamic coefficients is produced using a 
neural network optimized by a genetic algorithm. Basic aerodynamic coefficients 
(e.g. lift, drag, pitching moment) are modelled as functions of angle of attack and 
Mach number. The neural network is first trained on a relatively rich set of data 
from wind tunnel tests or numerical simulations to learn an overall model. Most of
the aerodynamic parameters can be well-fitted using polynomial functions. A new 
set of data, which can be relatively sparse, is then supplied to the network to pro- 
duce a new model consistent with the previous model and the new data. Because 
the new model interpolates realistically between the sparse test data points, it is 
suitable for use in piloted simulations. The genetic algorithm is used to choose a
neural network architecture to give best results, avoiding over- and under-fitting of 
the test data. 


1 Introduction 

Wind tunnels use scaled models to characterize aerodynamic coefficients. The 
wind tunnel data, in original form, are unsuitable for use in piloted simulations 
because data obtained in different wind tunnels with different scale models of the 
same vehicle are not always consistent. Also, measurements of the same coeffi- 
cient from two different wind tunnels are usually taken at dissimilar values of the 
aerodynamic controls (angle of attack, sideslip, etc.), and some means of
reconciling the two dissimilar sets of raw data is needed.

Fitting a smooth function through the wind tunnel data results in smooth deriva- 
tives of the data. The smooth derivatives are important in performing stability 
analysis. Traditionally, the approach considered to describe the aerodynamics of 
the vehicle included developing, wherever possible, a polynomial description of 
each aerodynamic function [3]. This ensured a smooth continuous function and 
removed some of the scatter in the wind tunnel data. This curve fitting procedure 
is unnecessary if the number of coefficients is small. The curve fitting method used 
to generate the parameters for each polynomial description is an unweighted least 
squares algorithm. For the most part, the polynomial equations are generated using 
sparse data from wind tunnel experiments. Because the data is sparse, linear func- 
tions are usually employed. When more data are available, flight control system 
designs will need to be revisited to allow for minor nonlinearities in control effects. 
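
As an illustration of the unweighted least-squares fitting described above, the following minimal sketch fits a straight line through a handful of points. The function name `fit_line` and the data values are hypothetical and purely illustrative; they are not taken from any wind tunnel test.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic points (assumption): lift coefficient against angle of attack in degrees.
alpha_deg = [0.0, 2.0, 4.0, 6.0, 8.0]
cl_values = [0.10, 0.31, 0.49, 0.71, 0.90]
intercept, slope = fit_line(alpha_deg, cl_values)
```

The fitted slope is a smooth estimate of the derivative of the coefficient with respect to angle of attack, which is the quantity a stability analysis would use.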

An aerodynamic model can be developed from wind tunnel data or by numerical 
simulation. Wind tunnel testing can be slow and costly due to high personnel over- 
head and intensive power utilization. Although manual curve fitting can be done, 
it is highly efficient to use a neural network [4, 19, 21] to describe the complex 
relationships between variables. Numerical simulation of complex vehicles, over the
wide range of conditions required for flight simulation, requires static and dynamic
data. Static data at low Mach numbers and angles of attack may be obtained with 
simpler Euler codes. Static data for stalled vehicles where zones of flow separation 
are present (usually at higher angles of attack) require Navier-Stokes simulations, 
which are costly due to the large processing time required to attain convergence. 
Preliminary dynamic data may be obtained with simpler methods based on corre- 
lations and vortex methods [2] ; however, accurate prediction of the dynamic coef- 
ficients requires complex and costly numerical simulations [20]. 

This paper is organized as follows. A short introduction to the neural network
is followed by a section that introduces the need for optimizing the neural
network. The following section discusses the aerodynamic data set. Then we
discuss the results (finding an optimal solution for the various aerodynamic
coefficients). The final section concludes by discussing the benefits of the GA-optimized
neural network, initial results and future research directions.

2 Neural Network 

A neural network is conceptually comprised of a collection of nodes and weighted 
connections [12, 17, 23]. The initial connection weights are simply random num- 
bers, which change during training. Training consists of presenting actual exper- 
imental data to the neural network and using a mathematical algorithm (in this 
case, the backpropagation algorithm) to adjust the weights. By presenting a suf- 
ficient number of input-output pairs, the network can be trained to approximate a 
function. 


Among the many neural network models, the backpropagation algorithm is one 
of the better known and frequently used. Backpropagation [23] is a generalization 
of the Widrow-Hoff learning rule [25] to multiple-layer networks and nonlinear 
differentiable transfer functions. The nodes are arranged in layers, with an input 
layer and an output layer representing the independent and dependent variables of
the function to be learned, and one or more "hidden" layers. Backpropagation was
the first practical method for training a multiple-layer feed forward network. Train- 
ing consists of presenting actual experimental data to the neural network and using 
a mathematical algorithm - the back propagation algorithm - to adjust the weights. 
Each training (input-output) pair of patterns goes through two stages of activation: 
a forward pass and a backward pass. The forward pass involves presenting a sam- 
ple input to the network and letting the activations (i.e. node outputs) propagate to 
the output layer. During the backward pass, the network’s actual output (from the 
forward pass) is compared with the target output and errors are computed for the 
output units. Adjustments of weights are based on the difference between the cor- 
rect and computed outputs. The weight adjustments are propagated from the output 
layer back through previous layers until all have been adjusted (hence
"backpropagation").

FeedForward

Apply an input; evaluate the activations $a_j$ and store the error $\delta_j$ at each node $j$:

$$a_j = \sum_i W_{ij}(t)\, I_i^p$$

$$A_j^p = g(a_j)$$

$$\delta_j = A_j^p - T_j^p$$

After each training pattern $I^p$ is presented, the correction to apply to the weights
is proportional to the error. The correction is calculated before the thresholding
step, using $\mathrm{err}(p) = T^p - W_{ij} I^p$.

Backpropagation

Compute the adjustments and update the weights:

$$W_{ij}(t+1) = W_{ij}(t) - \eta\, \delta_j\, I_i^p$$

where $0 \le \eta \le 1$ is a parameter that controls the learning rate.

$W_{ij}$ = weight from input $i$ to node $j$ in the output layer; $W_j$ is the vector of all the
weights of the $j$th neuron in the output layer.

$I^p$ = input vector (pattern $p$) = $(I_1^p, I_2^p, \ldots, I_n^p)$

$T^p$ = target output vector (pattern $p$) = $(T_1^p, T_2^p, \ldots, T_m^p)$

$A^p$ = actual output vector (pattern $p$) = $(A_1^p, A_2^p, \ldots, A_m^p)$

$g(\cdot)$ = sigmoid activation function: $g(a) = [1 + \exp(-a)]^{-1}$



Each training presentation of the entire set of input-output pairs is called a
training "epoch". In general, many epochs of training are required and the error
magnitude decreases as training proceeds. Once the errors between the intended and
actual outputs are within the specified tolerance, training is stopped and the neu- 
ral network is ready for use: given a new input observation, it will estimate what 
the corresponding output values should be. After extensive training, the network 
establishes the input-output relationships through the adjusted weights on the net- 
work. 

The backpropagation procedure requires that the node transfer functions be dif- 
ferentiable, but importantly, it does not require that they be linear. Typically a 
hidden layer transfer function is chosen to be nonlinear, allowing extremely com- 
plicated relationships to be learned by the network. 
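
The forward pass, backward pass and epoch loop described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it uses a single input, one tanh hidden layer, a linear output unit and plain online gradient descent, and the names `train_mlp`, `n_hidden`, `eta` and `tol` are hypothetical. The target function $y = x^2$ is synthetic.

```python
import math
import random


def train_mlp(samples, n_hidden=4, eta=0.1, epochs=500, tol=1e-4, seed=0):
    """Online backpropagation for a 1-input, 1-output network with one tanh hidden layer.

    Returns (predict function, list of mean squared errors, one per epoch).
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output weights
    b2 = 0.0                                                # output bias

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        y = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
        return h, y

    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, target in samples:
            h, y = forward(x)            # forward pass
            delta_out = y - target       # output error (actual minus target)
            sq_err += delta_out ** 2
            for j in range(n_hidden):    # backward pass
                # Error propagated to hidden unit j through the tanh derivative.
                delta_h = delta_out * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= eta * delta_out * h[j]
                w1[j] -= eta * delta_h * x
                b1[j] -= eta * delta_h
            b2 -= eta * delta_out
        history.append(sq_err / len(samples))
        if history[-1] < tol:            # stop once within the specified tolerance
            break
    return (lambda x: forward(x)[1]), history


# Synthetic training pairs (assumption): y = x^2 sampled on [-1, 1].
samples = [(k / 10.0, (k / 10.0) ** 2) for k in range(-10, 11)]
predict, history = train_mlp(samples)
```

The `history` list records the mean squared error per epoch, so the decrease in error magnitude as training proceeds can be inspected directly.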

3 Need for Neural Network Optimization 

The problem of neural network design comes down to searching for an architec- 
ture that performs best on some specified task according to explicit performance 
criteria. This process, in turn, can be viewed as searching the surface defined by 
levels of trained network performance above the space of possible neural network 
architectures. Since the number of possible hidden neurons and connections is 
unbounded, the surface is infinitely large. Since changes in the number of hidden 
neurons or connections must be discrete, and can have a discontinuous effect on 
the network’s performance, the surface is undifferentiable. The mapping from net- 
work design to network performance after learning is indirect, strongly epistatic, 
and dependent on initial conditions (e.g. random weights), so the surface is com- 
plex and noisy [18]. Structurally similar networks can show very different infor- 
mation processing capabilities, so the surface is deceptive; conversely, structurally 
dissimilar networks can show very similar capabilities, so the surface is multi- 
modal. Hence we seek an automated method for searching the vast, undifferen- 
tiable, epistatic, complex, noisy, deceptive, multimodal surface. 

The number of nodes on the hidden layer determines a network’s ability to learn 
the intended function from the training data and to generalize it to new data. If 
a neural network has too many hidden neurons, it will almost exactly learn, or 
memorize, the training examples, but it will not perform well in recognizing new 
data after the training process is complete. If a neural network has too few hidden 
neurons, it will have insufficient memory capacity to learn a complicated function 
represented by the training examples, i.e. the data will be under-fitted. Training 
can also be impeded by noise and outliers in the training data. Better convergence 
can be obtained by simply discarding some training samples, but clearly, this must 
not be overdone or the correct function will not be learned. 


A genetic algorithm is used to optimize the minimum number of training data
sets required to train the neural network and the minimum number of hidden
neurons in a three layer neural network architecture. The objective of the genetic
algorithm is to eliminate training cases that make it difficult for a neural network to
converge to the correct output, while avoiding discarding data unnecessarily
[9, 10, 13, 14]. The fitness function used for the genetic algorithm is chosen to
balance these conflicting requirements of training-data size reduction. The fitness
function for our genetic algorithm performs the following calculations for each
chromosome in the population:


Count the number of hidden neurons. 

Count the number of inputs ignored. 

Train the neural network for 500 learning cycles. (Beyond this point, the 
convergence of the neural network is not very significant.) Sum the training error 
for the last 40 cycles, to obtain an estimate of overall error in the trained network. 

Calculate the fitness value for a chromosome based on cumulative learning 
error, the number of inputs that are ignored, and the number of hidden layer neu- 
rons. 

The fitness function should minimize the training error, the number of hidden 
neurons and the number of inputs that are ignored (i.e., avoids discarding training 
cases except when absolutely necessary). In order to optimize the structure of the 
neural network using a genetic algorithm, a chromosome is encoded using
information from the input as well as the hidden neurons. We chose to use at most 15
hidden neurons, and this value can be encoded in four bits. At least one bit in the
chromosome represents information from the input neuron. When a fit chromosome is
found, that chromosome is used to specify the number of hidden layer neurons.
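
A minimal sketch of how such a chromosome might be decoded and scored is given below. The bit layout (four leading bits for the hidden-neuron count, followed by one keep/ignore bit per input), the penalty weights `w_hidden` and `w_ignored`, and the form of the fitness value are all assumptions made for illustration; the source does not specify them. The `training_error` argument is a hypothetical callable standing in for the step that trains the network and sums the error over the final cycles.

```python
def decode(chromosome, n_hidden_bits=4):
    """Split a list of bits into (number of hidden neurons, per-input keep mask)."""
    hidden_bits = chromosome[:n_hidden_bits]
    n_hidden = int("".join(str(bit) for bit in hidden_bits), 2)  # 0..15 for four bits
    keep_mask = chromosome[n_hidden_bits:]                       # 1 = use, 0 = ignore
    return n_hidden, keep_mask


def fitness(chromosome, training_error, w_hidden=0.01, w_ignored=0.05):
    """Higher is better: penalizes error, hidden neurons and ignored inputs."""
    n_hidden, keep_mask = decode(chromosome)
    if n_hidden == 0:
        return 0.0  # a network without hidden neurons is treated as unfit
    n_ignored = keep_mask.count(0)
    error = training_error(n_hidden, keep_mask)
    cost = error + w_hidden * n_hidden + w_ignored * n_ignored
    return 1.0 / (1.0 + cost)


# Stub error function (assumption): stands in for real network training.
example = [1, 0, 1, 0, 1, 1, 0, 1]
n_hidden, keep_mask = decode(example)
score = fitness(example, lambda n, mask: 0.5)
```

Mapping the combined cost to a value in (0, 1] keeps the fitness positive, which is convenient for selection schemes that compare or rank scores.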

4 Genetic Algorithm 

The basic genetic algorithm comprises four important steps (see [6]): initialization,
evaluation, exploitation (or selection), and exploration.

The first step is the creation of the initial population of chromosomes either 
randomly or by perturbing an input chromosome. How the initialization is done is 
not critical as long as the initial population spans a wide range of variable settings 
(i.e., has a diverse population). Thus, if explicit knowledge about the system being 
optimized is available that information can be included in the initial population. 

In the second step, the chromosomes are evaluated and their fitness func- 
tions are computed. The goal of the fitness function is to numerically encode the 
performance of the chromosome. For this problem of optimization, the choice of 
fitness function is the most critical step. 

The third step is the exploitation or natural selection step. In this step, the 
chromosomes with the largest fitness scores are placed one or more times into a 
mating subset in a semi-random fashion. Chromosomes with low fitness scores are 
removed from the population. There are several methods for performing exploitation.
In the binary tournament mating selection method, each chromosome in the
population competes for a position in the mating subset. Two chromosomes are
drawn at random from the population, and the chromosome with the higher fitness
score is placed in the mating subset. Both chromosomes are returned to the
population and another tournament begins. This procedure continues until the mating
subset is full. A characteristic of this scheme is that the worst chromosome in the
population will never be selected for inclusion in the mating subset.
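
The binary tournament can be sketched in a few lines. The function name `binary_tournament` and the toy population are hypothetical; the sketch assumes the two competitors in each tournament are distinct, which is what makes the worst chromosome unable to win.

```python
import random


def binary_tournament(population, fitnesses, subset_size, rng):
    """Fill a mating subset by repeated two-way tournaments."""
    mating_subset = []
    while len(mating_subset) < subset_size:
        # Draw two distinct chromosomes; both stay in the population afterwards.
        i, j = rng.sample(range(len(population)), 2)
        winner = i if fitnesses[i] >= fitnesses[j] else j
        mating_subset.append(population[winner])
    return mating_subset


# Toy population (assumption): labels stand in for chromosomes.
population = ["a", "b", "c", "d"]
fitnesses = [0.1, 0.4, 0.9, 0.3]
subset = binary_tournament(population, fitnesses, 20, random.Random(1))
```

Because "a" has the lowest fitness, it loses every tournament it enters and never appears in `subset`.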

The fourth step, exploration, consists of recombination and mutation oper- 
ators. Two chromosomes (parents) from the mating subset are randomly selected 
to be mated. The probability that these chromosomes are recombined (mated) is 
a user-controlled option and is usually set to a high value (e.g., 0.95). If the par- 
ents are allowed to mate, a recombination operator is employed to exchange genes 
between the two parents to produce two children. If they are not allowed to mate, 
the parents are placed into the next generation unchanged. The two most com- 
mon recombination operators are the one-point and two-point crossover methods. 
In the one-point method, a crossover point is selected along the chromosome and 
the genes up to that point are swapped between the two parents. In the two-point 
method, two crossover points are selected and the genes between the two points 
are swapped. The children then replace the parents in the next generation. A third 
recombination operator, which has recently become quite popular, is the uniform 
crossover method. In this method, recombination is applied to the individual genes 
in the chromosome. If crossover is performed, the genes between the parents are 
swapped and if no crossover is performed the genes are left intact. This crossover 
method has a higher probability of producing children that are very different from
their parents, so the probability of recombination is usually set to a low value (e.g.,
0.1). The probability that a mutation will occur is another user-controlled option
and is usually set to a low value (e.g., 0.01) so that good chromosomes are not 
destroyed. A mutation simply changes the value for a particular gene. 
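
The recombination and mutation operators described above can be sketched as follows for chromosomes stored as lists of bits. The function names and default probabilities are illustrative assumptions; the decision of whether a selected pair mates at all (e.g., with probability 0.95) is left to the caller.

```python
import random


def one_point(p1, p2, rng):
    """Swap the genes up to a single crossover point."""
    cut = rng.randrange(1, len(p1))
    return p2[:cut] + p1[cut:], p1[:cut] + p2[cut:]


def two_point(p1, p2, rng):
    """Swap the genes between two crossover points."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def uniform(p1, p2, rng, p_swap=0.1):
    """Decide gene by gene whether to swap between the parents."""
    c1, c2 = list(p1), list(p2)
    for k in range(len(p1)):
        if rng.random() < p_swap:
            c1[k], c2[k] = c2[k], c1[k]
    return c1, c2


def mutate(chromosome, rng, p_mut=0.01):
    """Flip each bit independently with probability p_mut."""
    return [1 - gene if rng.random() < p_mut else gene for gene in chromosome]


rng = random.Random(0)
child1, child2 = one_point([0] * 8, [1] * 8, rng)
```

All three crossover operators only exchange genes at matching positions, so each position in the two children always holds exactly the two parental values.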

After the exploration step, the population is full of newly created chromosomes 
(children) and steps two through four are repeated. This process continues for a 
fixed number of generations. For this application, the most widely used binary 
coded GA is used for encoding genes. In binary coding each chromosome is
comprised of zeroes and ones, where each bit represents a gene. To formulate the
chromosome for optimization, the bit string is concatenated with the bit strings from
the other variables to form one long binary string. We adopted a binary coding 
mechanism for creating the chromosomes. In the next section, we will discuss the 
data set required for the genetic algorithm optimized neural network. 

5 Data Set for Aerodynamic models 

Aerodynamic control systems can be divided into two categories, viz., control
surfaces and aerodynamic controls. In this paper, aerodynamic controls and models
are the focus. The variables involved in aerodynamic controls are angle of attack
($\alpha$), sideslip angle ($\beta$), elevon deflections ($\delta_e$), aileron deflections ($\delta_a$), rudder
deflection ($\delta_R$), speed brake deflection ($\delta_{SB}$), landing gear effects, and ground
effects. The general equations of forces (lb) and moments (ft-lb) for key parameters
are listed in Tables 1 and 2 [3].


Table 1: Aerodynamic Forces.

Forces (lb)     Model
Lift            $L = C_L \, q \, S$
Drag            $D = C_D \, q \, S$
Side-force      $F_Y = C_Y \, q \, S$


Table 2: Aerodynamic Moments.

Moments (ft-lb)   Model
Pitching          $PM = C_m \, q \, S \, c + (L \cos\alpha + D \sin\alpha) \, X_{mrc} + (L \sin\alpha - D \cos\alpha) \, Z_{mrc}$
Rolling           $RM = C_l \, q \, S \, b + F_Y \, Z_{mrc}$
Yawing            $YM = C_n \, q \, S \, b + F_Y \, X_{mrc}$
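
The force and moment models in Tables 1 and 2 can be evaluated with a short function such as the sketch below. The function name, argument names and numerical values are hypothetical. The minus sign in the second pitching-moment bracket, $(L \sin\alpha - D \cos\alpha)$, is an assumption, since the operator is not legible in the source table.

```python
import math


def forces_and_moments(CL, CD, CY, Cm, Cl, Cn, q, S, b, c, alpha_deg, x_mrc, z_mrc):
    """Evaluate forces (lb) and moments (ft-lb) from nondimensional coefficients.

    q is dynamic pressure (lb/ft^2), S reference area (ft^2), b span (ft),
    c reference chord (ft); x_mrc and z_mrc are moment reference offsets (ft).
    """
    alpha = math.radians(alpha_deg)
    L = CL * q * S
    D = CD * q * S
    FY = CY * q * S
    # Assumed sign convention: (L sin(alpha) - D cos(alpha)) multiplies z_mrc.
    PM = (Cm * q * S * c
          + (L * math.cos(alpha) + D * math.sin(alpha)) * x_mrc
          + (L * math.sin(alpha) - D * math.cos(alpha)) * z_mrc)
    RM = Cl * q * S * b + FY * z_mrc
    YM = Cn * q * S * b + FY * x_mrc
    return {"L": L, "D": D, "FY": FY, "PM": PM, "RM": RM, "YM": YM}


# Illustrative values only (assumption), evaluated at zero angle of attack.
loads = forces_and_moments(CL=0.5, CD=0.05, CY=0.0, Cm=-0.02, Cl=0.0, Cn=0.0,
                           q=100.0, S=200.0, b=30.0, c=10.0,
                           alpha_deg=0.0, x_mrc=1.0, z_mrc=0.5)
```

At zero angle of attack the pitching moment reduces to $C_m q S c + L X_{mrc} - D Z_{mrc}$, which makes the result easy to check by hand.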


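The aerodynamic coefficients involved in the above equations are presented below.

Longitudinal aerodynamic coefficients

Lift Coefficient $C_L$:

$$C_L = C_{L,BAS}(\alpha, M) + \Delta C_{L,FLAPS}(\delta_F) + \Delta C_{L,SPEEDBRAKE}(\alpha, \delta_{SB}) + \Delta C_{L,LG}(\delta_{LG}) + \Delta C_{L,ge}\!\left(\frac{h}{b}\right) + \Delta C_{L,q}(\alpha, M)\, q\, \frac{c}{2V} + \Delta C_{L,\dot\alpha}(\alpha, M)\, \dot\alpha\, \frac{c}{2V}$$

Drag Coefficient $C_D$:

$$C_D = C_{D,BAS}(\alpha, M) + \Delta C_{D,FLAPS}(\delta_F) + \Delta C_{D,SPEEDBRAKE}(\alpha, \delta_{SB}) + \Delta C_{D,LG}(\delta_{LG}) + \Delta C_{D,ge}\!\left(\frac{h}{b}\right) + \Delta C_{D,q}(\alpha, M)\, q\, \frac{c}{2V}$$

Pitching Moment Coefficient $C_m$:

$$C_m = C_{m,BAS}(\alpha, M) + \Delta C_{m,FLAPS}(\delta_F) + \Delta C_{m,SPEEDBRAKE}(\alpha, \delta_{SB}) + \Delta C_{m,LG}(\delta_{LG}) + \Delta C_{m,ge}\!\left(\frac{h}{b}\right) + \Delta C_{m,q}(\alpha, M)\, q\, \frac{c}{2V} + \Delta C_{m,\dot\alpha}(\alpha, M)\, \dot\alpha\, \frac{c}{2V}$$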

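Lateral aerodynamic coefficients

Side force Coefficient $C_Y$:

$$C_Y = C_{Y,\beta}(\alpha, M)\, \beta + \Delta C_{Y,RUDDER}(\delta_R) + \Delta C_{Y,AILERON}(\delta_A) + \Delta C_{Y,LG}(\delta_{LG}) + \Delta C_{Y,ge}\!\left(\frac{h}{b}\right) + \Delta C_{Y,p}(\alpha)\, p\, \frac{b}{2V} + \Delta C_{Y,r}(\alpha)\, r\, \frac{b}{2V}$$

Rolling Moment Coefficient $C_l$:

$$C_l = C_{l,\beta}(\alpha, M)\, \beta + \Delta C_{l,RUDDER}(\delta_R) + \Delta C_{l,AILERON}(\delta_A) + \Delta C_{l,LG}(\delta_{LG}) + \Delta C_{l,ge}\!\left(\frac{h}{b}\right) + \Delta C_{l,p}(\alpha)\, p\, \frac{b}{2V} + \Delta C_{l,r}(\alpha)\, r\, \frac{b}{2V}$$

Yawing Moment Coefficient $C_n$:

$$C_n = C_{n,\beta}(\alpha, M)\, \beta + \Delta C_{n,RUDDER}(\delta_R) + \Delta C_{n,AILERON}(\delta_A) + \Delta C_{n,LG}(\delta_{LG}) + \Delta C_{n,ge}\!\left(\frac{h}{b}\right) + \Delta C_{n,p}(\alpha)\, p\, \frac{b}{2V} + \Delta C_{n,r}(\alpha)\, r\, \frac{b}{2V}$$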

These coefficients depend primarily on angle of attack and Mach number,
with small increments from the other factors. Each can therefore be expressed as a
function of angle of attack and Mach number that resembles a simple polynomial
expression. Depending on the geometry and mass properties of the vehicle, the
aerodynamic coefficients will vary. The general parameters are tabulated in Table 3.


Table 3: Range of values involved in aerodynamic coefficients.

Parameters                      Ranges of values
Angle of attack (degrees)       $-10 < \alpha < 50$
Sideslip angle (degrees)        $-20 < \beta < 20$
Mach number                     $M \le 0.9$
Surface deflection (degrees)    $-15 <$ elevons (flaps) $< 15$
                                $-20 <$ rudder $< 20$
                                $-20 <$ ailerons $< 20$
                                $0 <$ speedbrake $< 80$


Inputs considered for determining base coefficients are angle of attack and Mach
number. The outputs of the neural network are the coefficients of the aerodynamic
model. A good training data set for a particular vehicle type, geometry, and mass
is selected from wind tunnel tests. If such a data set is not available
from wind tunnel experiments, a good training data set can be derived from
numerical computations using Euler, Navier-Stokes, or vortex lattice methods. This
data set consists of comprehensive input and output tuples covering the entire
parameter space.

Once the training data set is defined, sparse data collected from experiments can 
be interpolated and extended for the entire range of data using a trained neural 
network (provided the trained data range and sparse data range are similar). This 
will avoid repeating the entire experiment in the wind tunnel. Once the training 
data set is selected, one must determine the type of neural network architecture 
and transfer functions that will be used to interpolate the sparse data. The next 
section will discuss the selection procedure of the neural network architecture and 
transfer functions used in this work. 

6 Neural Network Architecture 

In this paper, interpolating for coefficient of lift CL is discussed for a sparse data 
set. (The rest of the various aerodynamic coefficients will be repeated with the 
same neural network architecture with respect to the corresponding data set.) The 
problem of defining neural network architectures [8] can be divided into the fol- 
lowing categories: (i) type of neural network (whether three layer or four layer, 
etc.); (ii) number of hidden neurons; (iii) type of transfer functions [5]; (iv) train- 
ing algorithm; and (v) validation of neural network output, e.g. testing for over- 
and under-fitting of the results. 

If the function consists of a finite number of points, a three layer neural network 
is capable of learning the function. Additional layers add unnecessary degrees of 
freedom which may cause the network to over-fit sparse data. Since the availability
of data is limited, the type of neural network considered for this problem is a three 
layer neural network, i.e. input layer, one hidden layer, and output layer. The input 
layer will have two input neurons (alpha and Mach number) and the output layer 
will contain a single neuron (coefficient of lift). The data domain has specific
definite bounds. The number of hidden neurons is to be chosen based on the efficient
fitting of the data. 

For determining an appropriate (hopefully optimal or near-optimal) number of 
hidden units [15], we construct a sequence of networks with increasing number of 
hidden neurons from 2 to 20. More than 20 hidden neurons cause an over fitting of 
the results [16]. Each neuron in the network is fully connected and uses all avail- 
able input variables. First, a network with a small number of hidden units is trained 
using random initial weights. Iteratively, a larger network is constructed (up to the 
20 hidden neurons) and the network results are compared with the expected results. 

Activation functions also play a key role in producing the best network results. 



The transfer function is a nonlinear function that when applied to the net input 
of a neuron (i.e. to the weighted sum of its connection inputs), determines the
output of the neuron. To get a best fit and characterize physical characteristics of 
the problem, it is suggested to use different kinds of transfer functions for differ- 
ent layers of the network. The majority of neural networks use a sigmoid function 
(S-shaped). A sigmoid function is defined as a continuous real-valued function
whose domain is the reals, whose derivative is always positive, and whose range 
is bounded. In this aerodynamic problem, a sigmoid function can produce an effi- 
cient fit. However, functions such as “tanh” that produce both positive and negative 
values tend to yield faster training than functions that produce only positive values 
such as sigmoid, because of better numerical conditioning. Numerical condition 
affects the speed and accuracy of most numerical algorithms. Numerical condition 
is especially important in the study of neural networks because ill-conditioning is 
a common cause of slow and inaccurate results from backprop-type algorithms. 
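
The relationship between the two S-shaped functions discussed above can be checked numerically. The sketch below uses the logistic form $g(a) = [1 + \exp(-a)]^{-1}$ for the sigmoid and verifies the identity $\tanh(a) = 2\,g(2a) - 1$, which shows that tanh is a rescaled sigmoid whose output is centred on zero. The sample points are arbitrary.

```python
import math


def sigmoid(a):
    """Logistic sigmoid: output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


sample_points = (-2.0, -0.5, 0.0, 0.5, 2.0)

# tanh is a shifted and rescaled sigmoid: tanh(a) = 2 * sigmoid(2a) - 1.
max_gap = max(abs(math.tanh(a) - (2.0 * sigmoid(2.0 * a) - 1.0)) for a in sample_points)

# At zero net input the sigmoid outputs 0.5 while tanh outputs 0.
sigmoid_at_zero = sigmoid(0.0)
tanh_at_zero = math.tanh(0.0)
```

Since the tanh outputs are symmetric about zero, the inputs passed to the next layer have no constant positive offset, which is one common explanation for the better numerical conditioning mentioned above.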

The transfer functions for the hidden units are chosen to be nonlinear. (Were 
they to be linear, the network could realize only linear functions. Because a linear 
function of linear functions is again a linear function, there would be no value in 
having a multi-layer network under that condition. It is the capability to
represent nonlinear functions that makes multilayer networks so powerful.) Three types
of activation functions are used in neural networks, namely linear, sigmoid and 
hyperbolic tangent. 

Training is restricted to 1000 epochs, each consisting of presenting a data set,
measuring the error, and updating the weights. The learning rate and momentum are selected appropriately
to get faster convergence of the network. The input and output values are scaled to 
range [0.1, 0.9] to ensure that the output will lie in a region of the nonlinear sig- 
moid transfer function where the derivative is large enough to facilitate training. 
The scaling is performed using the following equation: 

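$$A = r\,(V - V_{min}) + A_{min}$$

where $r = \frac{A_{max} - A_{min}}{V_{max} - V_{min}}$, $V$ is the observed variable, and $A$ is the
presentable (scaled) variable.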
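
A minimal sketch of this linear scaling to the range [0.1, 0.9], together with its inverse, is given below. The function name `make_scaler` is hypothetical, and the slope $r = (A_{max} - A_{min}) / (V_{max} - V_{min})$ is the standard choice for a linear map between two ranges, assumed here. The example range of -10 to 50 corresponds to the angle of attack limits.

```python
def make_scaler(v_min, v_max, a_min=0.1, a_max=0.9):
    """Return (scale, unscale) functions mapping [v_min, v_max] onto [a_min, a_max]."""
    r = (a_max - a_min) / (v_max - v_min)

    def scale(v):
        # A = r * (V - Vmin) + Amin
        return r * (v - v_min) + a_min

    def unscale(a):
        # Inverse map, used to convert network outputs back to physical units.
        return (a - a_min) / r + v_min

    return scale, unscale


# Example: angle of attack between -10 and 50 degrees.
scale_alpha, unscale_alpha = make_scaler(-10.0, 50.0)
scaled_mid = scale_alpha(20.0)
```

The inverse function is needed after training, because the network predicts scaled values that must be mapped back before they are compared with measured coefficients.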

Once the scaled training data set is prepared, it is ready for neural network train- 
ing. The Levenberg-Marquardt method [7] for solving the optimization is selected 
for backpropagation training. It is selected due to its guaranteed convergence to a 
local minimum, and its numerical robustness. 



7 Experiments 


The training data set is divided into two sets viz., data set pairs with Mach number 
less than 0.4, and those greater than 0.4. The data set is presented to the neural 
network architecture for the training. Initially a training set which has 233 pairs is 
presented to the neural network and training continues until a user-defined error tolerance is reached. The weights
are stored, and a sparse data set of 9 pairs is then provided to the same neural 
network for further training. The initial training data set represents an exhaustive 
combination of data points in the particular parameter space, allowing the network 
to learn the general pattern of a particular aerodynamic coefficient. Based on the 
general pattern, the second training data set is interpolated. 
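
The two-stage procedure (train on a rich set, keep the weights, then continue training on a sparse set) can be sketched as follows. Everything in this sketch is an assumption made for illustration: the data are synthetic, the surface `synthetic_cl` is made up, the class `TinyMLP` uses plain online gradient descent rather than the Levenberg-Marquardt method, and the learning rates and epoch counts are arbitrary. Only the set sizes (233 and 9 pairs) follow the text.

```python
import math
import random


class TinyMLP:
    """2 inputs -> tanh hidden layer -> 1 linear output, trained by online backprop."""

    def __init__(self, n_hidden=6, seed=0):
        rng = random.Random(seed)
        # Each hidden unit holds [weight_alpha, weight_mach, bias].
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
        # The output unit holds one weight per hidden unit plus a trailing bias.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _hidden(self, x):
        return [math.tanh(w[0] * x[0] + w[1] * x[1] + w[2]) for w in self.w1]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.w2[-1]

    def train(self, samples, eta, epochs):
        for _ in range(epochs):
            for x, target in samples:
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.w2[-1] - target
                for j, hj in enumerate(h):
                    grad_h = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= eta * err * hj
                    self.w1[j][0] -= eta * grad_h * x[0]
                    self.w1[j][1] -= eta * grad_h * x[1]
                    self.w1[j][2] -= eta * grad_h
                self.w2[-1] -= eta * err

    def mse(self, samples):
        return sum((self.predict(x) - t) ** 2 for x, t in samples) / len(samples)


def synthetic_cl(alpha, mach):
    """Made-up smooth surface standing in for a lift coefficient (inputs pre-scaled)."""
    return 0.2 + 0.6 * alpha - 0.3 * alpha * mach


data_rng = random.Random(1)
# Stage 1 data: 233 synthetic pairs covering the scaled parameter space.
rich = []
for _ in range(233):
    a, m = data_rng.uniform(0.1, 0.9), data_rng.uniform(0.1, 0.9)
    rich.append(((a, m), synthetic_cl(a, m)))
# Stage 2 data: 9 synthetic pairs from a slightly shifted surface.
sparse = [((a, m), synthetic_cl(a, m) + 0.1)
          for a in (0.1, 0.5, 0.9) for m in (0.1, 0.5, 0.9)]

net = TinyMLP()
mse_rich_before = net.mse(rich)
net.train(rich, eta=0.05, epochs=200)     # learn the overall pattern
mse_rich_after = net.mse(rich)

mse_sparse_before = net.mse(sparse)
net.train(sparse, eta=0.02, epochs=100)   # continue from the stored weights
mse_sparse_after = net.mse(sparse)
```

Because the second stage starts from the weights learned in the first, the network only has to adjust an existing surface toward the nine new points instead of learning the shape from nine points alone.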

The initial data set is plotted in figures 1 and 2, and the data in figure 1 can 
be represented by a linear type of function whereas the data in figure 2 can be 
expressed as a combination of linear and hyperbolic tangent or sigmoid functions. 
From numerous trials conducted with different combinations of transfer functions, 
we concluded that the linear transfer function should be adopted for the input- 
to-hidden neurons and hyperbolic tangent or sigmoid function should be used for 
the hidden-to-output layer. Figure 3 represents the sparse data set presented to the 
neural network successively after the initial training data set was presented.
Figures 4 and 5 represent the neural network predicted data from the sparse data set.
A few points are over-fitted or under-fitted in the results produced by the network.
Over- or under-fitting is due to the sparseness of the data. Overall the results produced
by the network are good. 

8 Conclusion 

Neural networks will become an important tool in future NASA Ames efforts to 
move directly from wind tunnel tests to virtual flight simulations. Many errors 
can be eliminated, and implementing a neural network can considerably reduce 
cost. Preliminary results indicate that the neural network is an efficient tool
to interpolate across sparse data. The neural network predictions deviate from the
data at the lower and upper ends of the Mach number range. The deviation is
caused by non-availability of data in the sparse set. Initially the neural network
is trained on the original data, which enables the network to learn an overall
pattern. Successive training on the sparse data alters the weights of the neural
network, which causes this deviation. This deviation is well within 10%, which
is acceptable in aerodynamic modeling. Further research is focused on overcoming
this deviation in predicting sparse data. It is also directed at optimizing the number
of hidden neurons and will be integrated into the web-enabled application. A
hybrid system using evolutionary theory and a neural network is planned to build 
an efficient model to predict aerodynamic variables. The neural network will be an 
integral tool of the data mining suite in an existing collaborative system at NASA. 

Figure 1: Results from the neural network: (a) initial training data for neural
network ($M \le 0.4$), (b) initial training data for neural network ($M > 0.4$),
(c) sparse data presented to the neural network, (d) neural network
interpolated data for sparse data ($M \le 0.4$), (e) neural network interpolated
data for sparse data ($M > 0.4$).